Abstract

The present contribution suggests the use of a multidimensional scaling (MDS) algorithm
as a visualization tool for manifold-valued elements. A visualization tool of this kind is
useful in signal processing and machine learning whenever learning/adaptation algorithms
operate on high-dimensional parameter manifolds.
1 Introduction

In machine learning, signal processing and intelligent data analysis, data to analyze and 
adaptive systems' parameters are oftentimes organized as vectors or matrices whose en- 
tries satisfy non-linear constraints. Exemplary cases of interest and related applications are 
summarized below: 

• Symmetric positive-definite matrices find a wide range of applications, for instance:
analysis of deformation [35], [?], image analysis [3], statistical analysis of diffusion
tensor data in medicine [23], automatic and intelligent control [?], pattern recognition
[40], [50], speech emotion classification, as well as design and analysis of wireless
cognitive dynamic systems [28];

• Calculation of the center of mass of Lie-group-valued and manifold-valued data
collections [?];

• Several applications deal with special orthogonal group connection patterns like, for
instance: invariant visual perception [33], modeling of DNA chains [32], [?], automatic
object pose estimation [15], distributed consensus optimization among agents [32],
study of plate tectonics [31], blind source separation and independent component
analysis [34], curve subdivision in nonlinear spaces [42], [53];

• There are a number of signal/data processing algorithms that learn parameter-vectors on
the unit hypersphere for applications such as blind channel deconvolution [?],
one-unit independent component analysis [30], robust constrained beamforming [?]
and data classification by linear discrimination based on non-Gaussianity discovery;
• Stiefel-manifold-based algorithms have become increasingly popular in the scientific
literature, with applications to optimal linear data compression, noise reduction and
signal representation by principal/minor component analysis and principal/minor
subspace decomposition [?], smart sensor arrays [?], [32], direction of arrival
estimation [?], [35], linear programming [5], electronic structures computation within local
density approximation (e.g. for understanding the thermodynamics of bulk materials,
the structure and dynamics of surfaces and the nature of point-defects in crystals) [13],
factor analysis in psychometrics [14];

• Non-compact manifolds play a role in signal/data processing and modeling. A prominent
example is the real symplectic group, whose known applications range from quantum
computing [?] to control of beam systems in particle accelerators [11], from
computational ophthalmology [?], [57] to vibration analysis [?] and control theory [?].
Recent studies have shown that non-compact manifolds are better framed as
pseudo-Riemannian manifolds rather than Riemannian manifolds [22]. (The case of
pseudo-Riemannian manifolds is not considered within the present manuscript, though.)

The above-mentioned cases may be framed as manifold-valued data analysis. 

A problem of practical impact is the visualization of such high-dimensional data. Whenever
the dimension of a manifold exceeds three, direct visualization is impossible. In
applications, the dimension of the manifolds that the data belong to is typically much
higher than three.

A possible solution to the visualization problem is provided by multidimensional scaling
(MDS). Multidimensional scaling is a numerical technique whose goal is to find a
low-dimensional representation of high-dimensional abstract objects suitable for graphical
representation. The aim of multidimensional scaling is to reduce the dimensionality of data
while retaining their proximity properties.

The aim of the present manuscript is to suggest the use of multidimensional scaling as a
tool to visualize manifold-valued elements. Such a tool is intended as a support for testing
and evaluating signal processing and machine learning algorithms that operate on manifolds.

As MDS computes a set of coordinate vectors (in $\mathbb{R}^2$ or $\mathbb{R}^3$, for
visualization purposes) whose distribution reflects a pattern of proximity, a key point in
visualization is the possibility to compute distances among objects on a manifold. For this
reason, the present manuscript deals with Riemannian manifolds.

Section 2 of the present manuscript briefly reviews metric notions of Riemannian
manifolds and the basic theory of MDS (as well as details on its implementation). Section 3
shows numerical examples about high-dimensional manifold-valued data visualization.

2 Manifolds of interest and multidimensional scaling 

The present section reviews some metric notions of smooth manifolds and recalls the way 
that multidimensional scaling works, its fundamental properties and implementation details. 

2.1 Geodesic distance on Riemannian manifolds 

Let the dataspace of interest be denoted as $X$. It is supposed to be a Riemannian manifold.
Its tangent space at point $x \in X$ is denoted as $T_xX$. On a Euclidean space $\mathbb{E}$,
the distance between two points $x, y \in \mathbb{E}$ may be measured as
$d_{\mathbb{E}}(x, y) = \|x - y\|$, where symbol $\|\cdot\|$ denotes the 2-norm. Such a
distance is indeed the length of the straight line connecting endpoints $x$ and $y$. Such a
notion of distance may be extended to a generic Riemannian manifold via the notion of
geodesic arcs (which generalize straight lines) and arc length on manifolds.

| Symbol | Description | Dimension | Inner product | Distance function |
|---|---|---|---|---|
| $SO(q)$ | Special orthogonal group: manifold of continuous rotations | $\frac{1}{2}q(q-1)$ | $\langle v, w \rangle_x = \mathrm{tr}(v^T w)$ | $d(x, y) = \sqrt{-\mathrm{tr}(\log^2(x^T y))}$ |
| $S^{q-1}$ | Unit hypersphere | $q-1$ | $\langle v, w \rangle_x = v^T w$ | $d(x, y) = \arccos(x^T y) \in [0, \pi]$ |
| $S^+(q)$ | Manifold of symmetric positive-definite matrices | $\frac{1}{2}q(q+1)$ | $\langle v, w \rangle_x = \mathrm{tr}(x^{-1} v x^{-1} w)$ | $d(x, y) = \sqrt{\mathrm{tr}(\log^2(x^{-1} y))}$ |
| $SU(q)$ | Special unitary group | $q^2-1$ | $\langle v, w \rangle_x = \mathrm{tr}(v^H w)$ | $d(x, y) = \sqrt{-\mathrm{tr}(\log^2(x^H y))}$ |

Table 1: Summary of the features of a few manifolds of interest in applications. Symbol $e$
denotes identity matrix of appropriate size, operator $\mathrm{tr}(\cdot)$ denotes matrix
trace, symbol $^T$ denotes ordinary transpose while symbol $^H$ denotes Hermitian transpose.

A Riemannian manifold is equipped with a symmetric, positive-definite inner product.
Given any pair of tangent vectors $v, w \in T_xX$, their inner product is denoted by:

$$\langle v, w \rangle_x \in \mathbb{R}. \qquad (1)$$

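Let $c_{x,v} : [0, 1] \to X$ denote a smooth curve on the manifold $X$ such that
$c_{x,v}(0) = x \in X$ and $\dot{c}_{x,v}(0) = v \in T_xX$. The length of the curve $c_{x,v}$
is given by:

$$\ell(c_{x,v}) = \int_0^1 \langle \dot{c}_{x,v}(t), \dot{c}_{x,v}(t) \rangle^{1/2}_{c_{x,v}(t)} \, dt. \qquad (2)$$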

Given arbitrary $x \in X$ and $v \in T_xX$, the curve $g_{x,v} : [0, 1] \to X$ of shortest
length is termed geodesic. Normal parametrization is assumed here, namely the quantity
$\langle \dot{g}_{x,v}(t), \dot{g}_{x,v}(t) \rangle_{g_{x,v}(t)}$ keeps constant for every
$t \in [0, 1]$. Such minimal length, namely $\ell(g_{x,v})$, is termed geodesic distance
between endpoints, namely between points $g_{x,v}(0) = x$ and $g_{x,v}(1)$. The Riemannian
distance between endpoints is denoted by:

$$d(g_{x,v}(0), g_{x,v}(1)) = \ell(g_{x,v}). \qquad (3)$$

Because of normal parametrization, it holds:

$$\ell(g_{x,v}) = \langle \dot{g}_{x,v}(0), \dot{g}_{x,v}(0) \rangle^{1/2}_{g_{x,v}(0)} = \langle v, v \rangle_x^{1/2}. \qquad (4)$$

The above setting may be extended to include pseudo-Riemannian manifolds. Also, 
more general metric spaces may be taken into account, although this exceeds the scope of 
the present paper. 

Table 1 summarizes the features of a few manifolds of interest in applications. Except
for the unit hypersphere, whose dimension grows linearly with $q$, the dimension of the
manifolds of interest grows quadratically with $q$.
For a general reference on differential geometry, see [48].
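
As an illustration, the distance functions of three of the manifolds discussed above may be
evaluated numerically through eigenvalues. The following Python sketch is not part of the
original formulation: the function names are hypothetical, NumPy is assumed to be available,
and the principal matrix logarithm is assumed throughout (no special treatment of the cut
locus is attempted).

```python
import numpy as np

def dist_sphere(x, y):
    """Geodesic distance arccos(x^T y) between unit vectors, in [0, pi]."""
    # Clipping guards against round-off pushing the inner product outside [-1, 1].
    return float(np.arccos(np.clip(np.dot(x, y), -1.0, 1.0)))

def dist_so(x, y):
    """Geodesic distance sqrt(-tr(log^2(x^T y))) between rotation matrices."""
    # The eigenvalues of the rotation x^T y are exp(i*theta_k); its principal
    # logarithm has eigenvalues i*theta_k, hence -tr(log^2) = sum(theta_k^2).
    theta = np.angle(np.linalg.eigvals(x.T @ y))
    return float(np.sqrt(np.sum(theta ** 2)))

def dist_spd(x, y):
    """Geodesic distance sqrt(tr(log^2(x^{-1} y))) between SPD matrices."""
    # x^{-1} y has real positive eigenvalues lambda_k, so that
    # tr(log^2(x^{-1} y)) = sum(log(lambda_k)^2).
    lam = np.linalg.eigvals(np.linalg.solve(x, y)).real
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))
```

Working with eigenvalues avoids forming the matrix logarithm explicitly, which keeps the
sketch dependent on NumPy only.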
2.2 Multidimensional Scaling (MDS)

One of the purposes of multidimensional scaling [5], [?] is to provide a visual
representation of the pattern of proximities among a set of high-dimensional objects. In
this instance, MDS finds a set of vectors in the two-dimensional or three-dimensional
Euclidean space such that the matrix of Euclidean distances among them corresponds - as
closely as possible - to some function of the objects' proximity matrix according to a
criterion function termed stress.

Formally, let $X$ be a high-dimensional Riemannian manifold with distance function
$d(\cdot, \cdot)$ and let $\{x_k\}_k$ be a given collection of $n$ elements of $X$. The aim
of MDS is to determine a collection of $n$ Euclidean coordinate-vectors $\{z_k\}_k$
($z_k \in \mathbb{R}^p$, with $p = 2$ or $p = 3$) that replicates the pattern of proximities
among the elements $x_k$. This may be achieved by minimizing the Kruskal stress function
that may be written - for a generic Riemannian manifold - as:

$$\Phi_K(\{z_k\}_k) = \sum_i \sum_{j<i} \left( \|z_i - z_j\| - d(x_i, x_j) \right)^2, \qquad (5)$$

where symbol $\|\cdot\|$ denotes Euclidean norm.

Alternatively, the Sammon stress function [51] may be used to measure the discrepancy
between the proximity pattern of the original data and the proximity pattern of their
low-dimensional versions. The Sammon stress function may be written - for a generic
Riemannian manifold - as:

$$\Phi_S(\{z_k\}_k) = \frac{1}{\sum_i \sum_{j<i} d(x_i, x_j)} \sum_i \sum_{j<i} \frac{\left( \|z_i - z_j\| - d(x_i, x_j) \right)^2}{d(x_i, x_j)}. \qquad (6)$$

The Sammon stress function differs from the Kruskal stress function in that the former 
emphasizes the relevance of distances that were originally small. 

Indeed, the Kruskal and the Sammon stress functions may be unified by the weighted
stress function:

$$\Phi(\{z_k\}_k) = M \sum_i \sum_{j<i} w_{i,j} \left( \|z_i - z_j\| - d(x_i, x_j) \right)^2, \qquad (7)$$

where the quantities $w_{i,j} = w_{j,i} \geq 0$ denote weights and $M > 0$ denotes a
normalization constant. In summary:

• Kruskal stress function: $M = 1$, $w_{i,j} = 1$.

• Sammon stress function: $M = \left( \sum_i \sum_{j<i} d(x_i, x_j) \right)^{-1}$,
$w_{i,j} = d^{-1}(x_i, x_j)$.

• Dwyer-Koren-Marriott stress function: In [12], the weighting scheme
$w_{i,j} = d^{-2}(x_i, x_j)$ and $M = 1$ are proposed.
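
To make the three weighting schemes concrete, the weighted stress function may be coded as
follows. This is a minimal sketch under stated assumptions: the function name
`weighted_stress`, the scheme labels and the storage of the distances $d(x_i, x_j)$ in a
symmetric matrix `D` are illustrative choices that do not appear in the original
formulation, and NumPy is assumed to be available.

```python
import numpy as np

def weighted_stress(Z, D, scheme="kruskal"):
    """Weighted stress of the coordinate-vectors stored as the rows of Z.

    D is the symmetric n-by-n matrix of geodesic distances d(x_i, x_j),
    assumed strictly positive off the diagonal (hypothetical data layout).
    """
    i, j = np.tril_indices(D.shape[0], k=-1)     # all index pairs with j < i
    d = D[i, j]
    e = np.linalg.norm(Z[i] - Z[j], axis=1)      # Euclidean distances ||z_i - z_j||
    if scheme == "kruskal":
        M, w = 1.0, np.ones_like(d)
    elif scheme == "sammon":
        M, w = 1.0 / d.sum(), 1.0 / d
    elif scheme == "dkm":                         # Dwyer-Koren-Marriott weights
        M, w = 1.0, 1.0 / d ** 2
    else:
        raise ValueError("unknown weighting scheme: " + scheme)
    return float(M * np.sum(w * (e - d) ** 2))
```

A single routine with a switch mirrors the unified form of the stress: the three schemes
differ only in the normalization constant and in the weights.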

The correspondence $x_k \mapsto z_k$ induced by multidimensional scaling is not unique. In
fact, if $\{z_k\}_k$ is a minimizer set of the stress function, for a given dimensionality
reduction problem, $c \in \mathbb{R}^p$ is a constant displacement vector and
$R \in SO(p)$ is a $p$-dimensional rotation, also $\{R z_k + c\}_k$ is a minimizer set.
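
The invariance just stated follows from the fact that the stress depends on the
coordinate-vectors only through their mutual Euclidean distances. A small numerical check,
given here as a sketch with arbitrarily chosen (hypothetical) values for the rotation angle
and the displacement vector, reads:

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((10, 2))                  # ten coordinate-vectors in R^2

theta = 0.7                                       # arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # an element of SO(2)
c = np.array([3.0, -1.5])                         # arbitrary displacement vector

def pairwise(P):
    """Matrix of Euclidean distances among the rows of P."""
    return np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)

Z_moved = Z @ R.T + c                             # rows are R z_k + c
print(np.allclose(pairwise(Z), pairwise(Z_moved)))
```

Since the two distance matrices agree up to round-off, any stress function of the weighted
kind takes the same value on both configurations.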

Multidimensional scaling may be used as a proximity/similarity visualization tool for
high-dimensional data as it computes two-dimensional or three-dimensional vectors
$z_k \in \mathbb{R}^p$, corresponding to the original elements $x_k \in X$, that capture the
fundamental information about mutual distances.

The axes corresponding to the coordinates of the vectors $z_k$, referred to as 'fictitious
coordinates', do not possess any physical meaning, in general. All that matters in an MDS
map are the proximity properties. In an MDS-based visualization corresponding to a non-zero
stress, the distances among objects are imperfect representations of the relationships
among original data: The greater the stress, the greater the distortion.

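A key observation in the implementation of multidimensional scaling is that, rather than
optimizing the actual stress function (7), it is more computationally convenient to optimize
a quadratic majorization of it [5]. The following details on optimization of a majorizing
function are drawn from [12]. Define the matrix $A \in \mathbb{R}^{n \times n}$ as follows:

$$A_{i,j} = \begin{cases} -w_{i,j} & \text{for } i \neq j, \\ \sum_{s \neq i} w_{i,s} & \text{for } i = j. \end{cases} \qquad (8)$$

Moreover, given a collection of $n$ vectors $\{u_k\}_k$ ($u_k \in \mathbb{R}^p$), define the
matrix $C(\{u_k\}_k)$ as:

$$C_{i,j}(\{u_k\}_k) = \begin{cases} -\dfrac{w_{i,j}\, d(x_i, x_j)}{\|u_i - u_j\|} & \text{for } i \neq j, \\ -\sum_{s \neq i} C_{i,s}(\{u_k\}_k) & \text{for } i = j. \end{cases} \qquad (9)$$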

To ease the notation, define matrix $Z \in \mathbb{R}^{n \times p}$, whose rows coincide
with the $n$ coordinate-vectors $z_k$, and matrix $U \in \mathbb{R}^{n \times p}$, whose rows
coincide with the $n$ coordinate-vectors $u_k$. The stress function (7), with $M = 1$, is
bounded from above by the quadratic form $\Psi(Z; U)$ defined as:

$$\Psi(Z; U) = \sum_i \sum_{j<i} w_{i,j}\, d^2(x_i, x_j) + \sum_{a=1}^{p} \left( Z_a^T A Z_a - 2 Z_a^T C(U) U_a \right), \qquad (10)$$

where $Z_a$ denotes the $a$-th column of the matrix $Z$ and $U_a$ denotes the $a$-th column
of the matrix $U$ ($a = 1, \ldots, p$). It holds $\Phi(Z) \leq \Psi(Z; U)$, with equality
holding when $Z = U$. In order to iteratively solve the optimization problem of minimizing
the criterion (7), one may use the following scheme:

0. Set initial guess $U$,

1. $U \leftarrow \arg\min_Z \Psi(Z; U)$,

2. Repeat from 1 until done,

3. $Z \leftarrow U$.
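
The scheme above can be sketched in Python as follows. The sketch rests on several
assumptions that are not prescribed by the scheme itself: the weights are taken as
$w_{i,j} = d^{-2}(x_i, x_j)$ with $M = 1$, the inner minimization is only approximated by a
fixed number of steepest descent steps with exact line search (the iteration described
later in this section), and the function name, the iteration counts and the random initial
guess are hypothetical choices.

```python
import numpy as np

def mds_majorization(D, p=2, outer=300, inner=30, seed=0):
    """Sketch of stress majorization for a symmetric distance matrix D."""
    n = D.shape[0]
    off = ~np.eye(n, dtype=bool)                  # mask of off-diagonal entries
    W = np.zeros((n, n))
    W[off] = 1.0 / D[off] ** 2                    # assumed weights w_ij = d^-2
    A = np.diag(W.sum(axis=1)) - W                # matrix A: weighted Laplacian
    U = np.random.default_rng(seed).standard_normal((n, p))   # step 0
    for _ in range(outer):
        E = np.linalg.norm(U[:, None, :] - U[None, :, :], axis=2)
        C = np.zeros((n, n))
        ok = off & (E > 1e-12)                    # skip coincident points
        C[ok] = -W[ok] * D[ok] / E[ok]            # off-diagonal entries of C(U)
        C[np.diag_indices(n)] = -C.sum(axis=1)    # diagonal entries of C(U)
        B = -2.0 * (C @ U)                        # linear terms, one column per axis
        Z = U.copy()
        for _ in range(inner):                    # approximate arg min, axis by axis
            G = 2.0 * (A @ Z) + B                 # gradients of the p quadratic forms
            num = np.sum(G * G, axis=0)
            den = 2.0 * np.sum(G * (A @ G), axis=0)
            s = np.divide(num, den, out=np.zeros(p), where=den > 1e-15)
            Z = Z - G * s                         # exact line-search step per axis
        U = Z                                     # step 1
    return U                                      # step 3
```

Because every inner step starts from $Z = U$ and never increases the majorizing function,
the stress cannot increase from one outer iteration to the next, even though the inner
problem is solved only approximately.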

It is immediate to verify that the optimization problem of minimizing the criterion (10)
is equivalent to $p$ independent optimization problems, one for each axis. Consequently, the
optimization problem (10) may be reduced to the solution of $p$ quadratic problems of the
kind:

$$\min_{Z_a} \; Z_a^T A Z_a - 2 Z_a^T C(U) U_a. \qquad (11)$$

As the matrix $A$ is positive semi-definite, each of the above problems has only global
minima. Moreover, note that the matrix $A$ is constant across the $p$ optimization problems.
In order to iteratively solve the quadratic optimization problem:

$$\min_z \, (z^T A z + z^T b), \quad A \succeq 0, \; b \in \mathbb{R}^n, \qquad (12)$$
one may use the following steepest descent algorithm with exact line search¹:

0. Set initial guess $z$,

1. $g \leftarrow 2Az + b$,

2. $s \leftarrow \dfrac{g^T g}{2 g^T A g}$,

3. $z \leftarrow z - s g$,

4. Repeat from 1 until done.
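
The four steps above may be transcribed almost literally. In the following sketch the
function name, the iteration cap and the stopping tolerance are assumptions introduced for
illustration; the small matrix used in the example is a hypothetical singular instance
chosen to show that different initial guesses lead to different minimizers.

```python
import numpy as np

def steepest_descent(A, b, z0, iters=200, tol=1e-12):
    """Minimize z^T A z + z^T b for a symmetric positive semi-definite A."""
    z = np.asarray(z0, dtype=float).copy()        # step 0: initial guess
    for _ in range(iters):
        g = 2.0 * A @ z + b                       # step 1: gradient
        curvature = 2.0 * g @ A @ g
        if g @ g < tol or curvature < tol:        # stationary, or no curvature along g
            break
        s = (g @ g) / curvature                   # step 2: exact line-search stepsize
        z = z - s * g                             # step 3: update
    return z

# Hypothetical singular example: A annihilates constant vectors.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
b = np.array([-2.0, 2.0])
z_a = steepest_descent(A, b, np.zeros(2))
z_b = steepest_descent(A, b, np.array([5.0, 5.0]))
print(z_a, z_b)                                   # both satisfy z[0] - z[1] = 1
```

The stepsize of step 2 is the exact minimizer of the objective along the direction $-g$,
so no stepsize parameter needs to be tuned.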

As the matrix $A$ is only positive semi-definite (each of its rows sums to zero), it is
rank-deficient. As a consequence, the optimization problem (12) may not be solved in closed
form by inverting $A$ and does not admit a unique solution.

2.3 Spherical MDS 

As mentioned, multidimensional scaling is a method for embedding a general distance matrix 
into a low dimensional Euclidean space, used both as a pre-processing step as well as a 
visualization tool. 

There are also applications where the target space, rather than being a Euclidean space, 
is a smooth manifold. In particular, Spherical MDS is the problem of embedding a matrix of 
distances onto a (low-dimensional) sphere. Spherical MDS has applications in texture map- 
ping, image analysis as well as dimensionality reduction for finite dimensional distributions. 
A spherical dimensionality reduction method may have considerable impact in domains that 
represent data as histograms or distributions, such as in document processing and speech 
recognition [2]. 

An algorithmic framework for spherical multidimensional scaling has recently been de- 
scribed in [5]. 

3 Numerical Illustrative Examples 

The present section aims at illustrating the behavior of the MDS-based visualization tool
for manifold-valued elements. In all the following examples, the stress function proposed in
[12] is used.

The first example aims at making the reader acquainted with the purpose of
multidimensional scaling. It concerns the famous experiment with city-distances: The
distance pattern among 9 cities is fed to the MDS algorithm, which computes the coordinates
of 9 bi-dimensional points as shown in Figure 1. The figure shows that the MDS optimization
algorithm tends to minimize the stress, which indeed is a measure of discrepancy between the
pattern of proximity among cities (9 × 9 = 81 distances shown on the upper-right panel) and
the pattern of proximity of the coordinate-vectors (81 distances shown on the lower-right
panel). As an example, Los Angeles and San Francisco are close to each other and far from
New York, and the computed set of coordinate-vectors reflects this fact.

The following two (toy) examples of MDS maps concern a distribution of random points
in $\mathbb{R}^2$ visualized on $\mathbb{R}^2$. As the minimization problem (7) does not
possess a unique solution, it is not reasonable to expect that an MDS map
$\mathbb{R}^2 \to \mathbb{R}^2$ is necessarily an identity.

Figure 2 shows a result obtained with an initial guess just slightly randomly shifted with
respect to the actual points. In this case, the computed coordinate-points and the actual
points coincide. The same holds true for the 10 × 10 matrices of actual and computed
distances.

¹ Calculations show that the optimal stepsize $s$ given in [12] is incorrect by a factor 2.






Figure 1: Example on city-distances. Upper-left panel: Locations of the computed
coordinate-points (labeled for clarity). Lower-left panel: Stress function during iteration
normalized to initial stress function (ratio expressed in decibels). Upper-right and
lower-right panels: Pattern of distances among actual points and computed coordinate-points,
respectively.

The same experiment was repeated with a random initial guess: Figure 3 shows that,
although the set of coordinate-points replicates the distance pattern of the actual points,
their locations are seemingly unrelated to those of the actual points. However, points
that are actually close to each other (for example, points marked as 3 and 9) keep close to
each other in the computed representation.

A further example concerns the bi-dimensional visualization of a distribution of points
on a three-dimensional unit sphere. Figure 4 shows the actual points on a three-dimensional
spherical surface, labeled for clarity.

Figure 5 shows the result obtained. The points that are close to each other on the sphere
(for example, points 4 and 13) are close to each other on the bi-dimensional plane too.
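
The input to the MDS algorithm in an experiment of this kind is the matrix of geodesic
distances among the points on the sphere. A minimal sketch of how such a matrix may be
assembled is given below; the number of points (15), the random seed and the variable
names are assumptions made for illustration and are not taken from the experiment
described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 15                                     # assumed number of points
X = rng.standard_normal((n_points, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)     # rows are points on the unit sphere

# Geodesic distance on the unit sphere: d(x, y) = arccos(x^T y), in [0, pi].
D = np.arccos(np.clip(X @ X.T, -1.0, 1.0))        # clip guards against round-off
np.fill_diagonal(D, 0.0)                          # exact zeros on the diagonal
```

Normalizing Gaussian samples yields points that are uniformly distributed on the sphere,
which is a convenient (though not the only possible) way to produce a test distribution.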

The MDS-based visualization tool may be profitably used to inspect the trajectory of
the manifold-valued state of a signal processing algorithm. Figure 6 refers to the trajectory
of a non-negative independent component analysis (NNICA) algorithm [?] which learns
on $SO(9)$. Every time-step of the algorithm generates a 9 × 9 orthogonal matrix with +1
determinant that may be parameterized with no less than 9 × 8/2 = 36 angles, whose
visualization 'as is' is impossible. The present visualization by multidimensional scaling is
based on a map $SO(9) \to \mathbb{R}^p$. The NNICA algorithm ran through 200 time-steps,
down-sampled to 20 for visual tidiness (initial point labeled as '1', final point labeled as
'20'). The succession of points clearly evidences a convergent adaptation trajectory.

Figure 2: Toy experiment on finding a $\mathbb{R}^2 \to \mathbb{R}^2$ MDS map starting from
a slightly randomly shifted initial guess. Upper-left panel: Locations of the actual points
(diamond) and the computed coordinate-points (open circles). Note that the points are
numbered for clarity. Lower-left panel: Stress function during iteration normalized to
initial stress function (ratio expressed in decibels). Upper-right and lower-right panels:
Pattern of distances among actual points and computed coordinate-points, respectively.

4 Conclusion 

The aim of the present contribution was to suggest the use of multidimensional scaling, a 
well-known dimensionality reduction technique, as a visualization tool for manifold-valued 
data. 

Visualization tools are useful in signal processing and machine learning as they help
inspect the distribution of high-dimensional vectors and matrices. The MDS-based
visualization tool suggested in the present paper captures the pattern of proximity among
high-dimensional manifold-valued elements and computes a set of 2-dimensional or
3-dimensional vectors that retain the same pattern of proximity.
Figure 3: Toy experiment on finding a $\mathbb{R}^2 \to \mathbb{R}^2$ MDS map starting from
a random initial guess. Upper-left panel: Locations of the actual points (diamond) and the
computed coordinate-points (open circles). Note that the points are numbered for clarity.
Lower-left panel: Stress function during iteration normalized to initial stress function
(ratio expressed in decibels). Upper-right and lower-right panels: Pattern of distances
among actual points and computed coordinate-points, respectively.